Body Fat Percentage Presentation

Group 013E01

Joanne Lim, John Fu, Haoyuan Gao, Zhenchen Yi

Data Description

  • Accurate body fat measurement is vital but often inconvenient and expensive.
  • Can we use simpler measurements like height, weight, and circumferences for estimates?
  • The data set bodyfat contains body fat percentages and other related measurements for 250 men. These measurements, which include height, weight and various body circumferences, were collected to explore alternatives to underwater body fat assessments.
Rows: 250
Columns: 16
$ Density <dbl> 1.0708, 1.0853, 1.0414, 1.0751, 1.0340, 1.0502, 1.0549, 1.0704…
$ Pct.BF  <dbl> 12.3, 6.1, 25.3, 10.4, 28.7, 20.9, 19.2, 12.4, 4.1, 11.7, 7.1,…
$ Age     <int> 23, 22, 22, 26, 24, 24, 26, 25, 25, 23, 26, 27, 32, 30, 35, 35…
$ Weight  <dbl> 154.25, 173.25, 154.00, 184.75, 184.25, 210.25, 181.00, 176.00…
$ Height  <dbl> 67.75, 72.25, 66.25, 72.25, 71.25, 74.75, 69.75, 72.50, 74.00,…
$ Neck    <dbl> 36.2, 38.5, 34.0, 37.4, 34.4, 39.0, 36.4, 37.8, 38.1, 42.1, 38…
$ Chest   <dbl> 93.1, 93.6, 95.8, 101.8, 97.3, 104.5, 105.1, 99.6, 100.9, 99.6…
$ Abdomen <dbl> 85.2, 83.0, 87.9, 86.4, 100.0, 94.4, 90.7, 88.5, 82.5, 88.6, 8…
$ Waist   <dbl> 33.54331, 32.67717, 34.60630, 34.01575, 39.37008, 37.16535, 35…
$ Hip     <dbl> 94.5, 98.7, 99.2, 101.2, 101.9, 107.8, 100.3, 97.1, 99.9, 104.…
$ Thigh   <dbl> 59.0, 58.7, 59.6, 60.1, 63.2, 66.0, 58.4, 60.0, 62.9, 63.1, 59…
$ Knee    <dbl> 37.3, 37.3, 38.9, 37.3, 42.2, 42.0, 38.3, 39.4, 38.3, 41.7, 39…
$ Ankle   <dbl> 21.9, 23.4, 24.0, 22.8, 24.0, 25.6, 22.9, 23.2, 23.8, 25.0, 25…
$ Bicep   <dbl> 32.0, 30.5, 28.8, 32.4, 32.2, 35.7, 31.9, 30.5, 35.9, 35.6, 32…
$ Forearm <dbl> 27.4, 28.9, 25.2, 29.4, 27.7, 30.6, 27.8, 29.0, 31.1, 30.0, 29…
$ Wrist   <dbl> 17.1, 18.2, 16.6, 18.2, 17.7, 18.8, 17.7, 18.8, 18.2, 19.2, 18…

Null and Full Model


Call:
lm(formula = Pct.BF ~ ., data = bodyfat)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.3746 -0.3725 -0.1157  0.2358 15.0629 

Coefficients: (1 not defined because of singularities)
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.494e+02  1.154e+01  38.961   <2e-16 ***
Density     -4.098e+02  8.384e+00 -48.876   <2e-16 ***
Age          1.395e-02  9.721e-03   1.435    0.153    
Weight       1.527e-02  2.015e-02   0.758    0.449    
Height      -1.558e-02  5.752e-02  -0.271    0.787    
Neck        -1.653e-02  7.084e-02  -0.233    0.816    
Chest        1.790e-02  3.259e-02   0.549    0.583    
Abdomen      1.833e-02  3.286e-02   0.558    0.578    
Waist               NA         NA      NA       NA    
Hip          2.537e-02  4.391e-02   0.578    0.564    
Thigh       -2.107e-02  4.421e-02  -0.476    0.634    
Knee        -1.657e-02  7.366e-02  -0.225    0.822    
Ankle       -8.160e-02  6.616e-02  -1.233    0.219    
Bicep       -5.256e-02  5.132e-02  -1.024    0.307    
Forearm      1.405e-02  6.229e-02   0.225    0.822    
Wrist       -1.883e-02  1.640e-01  -0.115    0.909    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.276 on 235 degrees of freedom
Multiple R-squared:  0.9777,    Adjusted R-squared:  0.9763 
F-statistic: 734.4 on 14 and 235 DF,  p-value: < 2.2e-16
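
The NA row for Waist, together with the note "1 not defined because of singularities", means that one predictor is an exact linear function of the others. A quick check on the first six values shown in the data preview suggests that Waist is simply Abdomen converted from centimetres to inches, so it adds no information. The sketch below uses Python for illustration only; the variable names are our own and only the six previewed rows are checked.

```python
# First six Abdomen and Waist values, copied from the data preview.
abdomen_cm = [85.2, 83.0, 87.9, 86.4, 100.0, 94.4]
waist = [33.54331, 32.67717, 34.60630, 34.01575, 39.37008, 37.16535]

CM_PER_INCH = 2.54

# Convert Abdomen to inches and compare with the Waist column.
abdomen_in = [a / CM_PER_INCH for a in abdomen_cm]
max_gap = max(abs(a - w) for a, w in zip(abdomen_in, waist))

# A gap no larger than print rounding means the two columns are collinear
# (at least on these six rows).
waist_is_abdomen_in_inches = max_gap < 1e-4
print(f"largest gap: {max_gap:.1e}, collinear: {waist_is_abdomen_in_inches}")
```

Because the two columns carry the same information, lm() can estimate a coefficient for only one of them, which is why the F-statistic reports 14 rather than 15 model degrees of freedom.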

Null and Full Model

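metric           M0 (full)    M1 (null)
r.squared             0.98         0.00
adj.r.squared         0.98         0.00
sigma                 1.28         8.29
statistic           734.37
p.value               0.00
df                   14.00
logLik             -407.98      -883.12
AIC                 847.96     1,770.23
BIC                 904.31     1,777.28
deviance            382.77    17,128.82
df.residual         235.00       249.00
nobs                250.00       250.00

The column labelled M0 is the full model and M1 is the intercept-only null model, as the residual degrees of freedom (235 and 249) show.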
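
Most entries in this table can be recovered from the deviance (residual sum of squares), the log-likelihood and the degrees of freedom. The following Python sketch recomputes them as a sanity check. It assumes the usual definitions for a Gaussian linear model, with the error variance counted as one estimated parameter in AIC and BIC; small differences in the last digit come from the rounded inputs.

```python
import math

n = 250
# Values copied from the comparison table.
rss_full, rss_null = 382.77, 17128.82    # deviance = residual sum of squares
df_full, df_null = 235, 249              # residual degrees of freedom
loglik_full, loglik_null = -407.98, -883.12

p = df_null - df_full                    # 14 estimable slopes in the full model

r2_full = 1 - rss_full / rss_null
sigma_full = math.sqrt(rss_full / df_full)
sigma_null = math.sqrt(rss_null / df_null)
f_stat = ((rss_null - rss_full) / p) / (rss_full / df_full)


def aic(loglik, k):
    """Akaike information criterion for k estimated parameters."""
    return -2 * loglik + 2 * k


def bic(loglik, k, n_obs):
    """Bayesian information criterion for k estimated parameters."""
    return -2 * loglik + k * math.log(n_obs)


# k counts the regression coefficients plus the error variance.
k_full = (p + 1) + 1
k_null = 1 + 1

print(f"R2 = {r2_full:.4f}, sigma = {sigma_full:.3f} / {sigma_null:.2f}, F = {f_stat:.1f}")
print(f"AIC = {aic(loglik_full, k_full):.2f} / {aic(loglik_null, k_null):.2f}")
print(f"BIC = {bic(loglik_full, k_full, n):.2f} / {bic(loglik_null, k_null, n):.2f}")
```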

Backward stepwise selection


Call:
lm(formula = Pct.BF ~ Density + Age + Abdomen, data = bodyfat)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.2913 -0.3576 -0.0911  0.2319 15.4601 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.424e+02  8.738e+00  50.626  < 2e-16 ***
Density     -4.065e+02  7.279e+00 -55.844  < 2e-16 ***
Age          1.182e-02  6.579e-03   1.796   0.0737 .  
Abdomen      5.761e-02  1.332e-02   4.326 2.21e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.26 on 246 degrees of freedom
Multiple R-squared:  0.9772,    Adjusted R-squared:  0.9769 
F-statistic:  3513 on 3 and 246 DF,  p-value: < 2.2e-16

Forward stepwise selection


Call:
lm(formula = Pct.BF ~ Density + Abdomen + Age, data = bodyfat)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.2913 -0.3576 -0.0911  0.2319 15.4601 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.424e+02  8.738e+00  50.626  < 2e-16 ***
Density     -4.065e+02  7.279e+00 -55.844  < 2e-16 ***
Abdomen      5.761e-02  1.332e-02   4.326 2.21e-05 ***
Age          1.182e-02  6.579e-03   1.796   0.0737 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.26 on 246 degrees of freedom
Multiple R-squared:  0.9772,    Adjusted R-squared:  0.9769 
F-statistic:  3513 on 3 and 246 DF,  p-value: < 2.2e-16

Summary of all models

                   Full                 Backward             Forward
Predictors         Estimates  p         Estimates  p         Estimates  p
(Intercept)          449.43   <0.001      442.38   <0.001      442.38   <0.001
Density             -409.76   <0.001     -406.49   <0.001     -406.49   <0.001
Age                    0.01    0.153        0.01    0.074        0.01    0.074
Weight                 0.02    0.449
Height                -0.02    0.787
Neck                  -0.02    0.816
Chest                  0.02    0.583
Abdomen                0.02    0.578        0.06   <0.001        0.06   <0.001
Hip                    0.03    0.564
Thigh                 -0.02    0.634
Knee                  -0.02    0.822
Ankle                 -0.08    0.219
Bicep                 -0.05    0.307
Forearm                0.01    0.822
Wrist                 -0.02    0.909
Observations         250                  250                  250
R2 / R2 adjusted     0.978 / 0.976        0.977 / 0.977        0.977 / 0.977
AIC                  847.962              831.066              831.066

Model selection conclusion

  • A higher R-squared indicates a closer fit to the data, but R-squared never decreases when predictors are added, so a value well above the adjusted R-squared can signal overfitting.
  • A lower AIC indicates a better balance between goodness of fit and model complexity.
  • Backward and forward selection arrive at the same model (Density, Age and Abdomen). It has a lower AIC than the full model (831.07 versus 847.96), and its R-squared and adjusted R-squared agree to three decimal places (0.977), so we take it as the most appropriate model.
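
The comparison between R-squared and adjusted R-squared can be made concrete with the standard adjustment formula, adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1). The Python sketch below applies it to the reported figures and also computes the AIC difference between the full and selected models. It is a rough check built from rounded outputs, so agreement is expected only to about four decimal places.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for a model with p predictors fitted to n rows."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


n = 250

# Full model: R-squared rebuilt from the reported deviances, 14 estimable predictors.
r2_full = 1 - 382.77 / 17128.82
# Selected model: R-squared as printed in its summary output, 3 predictors.
r2_selected = 0.9772

adj_full = adjusted_r2(r2_full, n, 14)
adj_selected = adjusted_r2(r2_selected, n, 3)

# Penalty paid for the extra predictors: gap between R-squared and adjusted R-squared.
gap_full = r2_full - adj_full
gap_selected = r2_selected - adj_selected

# AIC values from the summary table; a positive difference favours the selected model.
delta_aic = 847.962 - 831.066

print(f"adjusted R2: full {adj_full:.4f}, selected {adj_selected:.4f}")
print(f"gap: full {gap_full:.4f}, selected {gap_selected:.4f}")
print(f"AIC difference: {delta_aic:.3f}")
```

The gap is small for both models, but it is several times larger for the full model, which matches the reading that its eleven extra predictors add almost nothing.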

Check Linearity

Age:

Check Linearity

Density:

Check Linearity

Abdomen:

Check homoscedasticity

Check Normality of Residuals

Apart from three points in the upper tail and one point in the lower tail, the majority of points lie quite close to the line in the QQ plot. Hence, the normality assumption for the residuals is reasonably well satisfied.

Additionally, the sample is fairly large (n = 250), so we can also rely on the central limit theorem to give approximately valid inferences.
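
To give a sense of scale for the tail points mentioned above, the Python sketch below computes the theoretical normal quantiles a QQ plot would assign to the smallest and largest of 250 residuals, and compares them with the reported extreme residuals divided by the residual standard error. This is a rough standardisation that ignores leverage, and the plotting positions (i − 0.5)/n are an assumption about how the plot was drawn.

```python
from statistics import NormalDist

n = 250
sigma = 1.26                           # residual standard error of the selected model
res_min, res_max = -8.2913, 15.4601    # extreme residuals from its summary output

# Theoretical quantile for the i-th ordered residual: inverse normal CDF at (i - 0.5) / n.
std_normal = NormalDist()
q_low = std_normal.inv_cdf(0.5 / n)
q_high = std_normal.inv_cdf((n - 0.5) / n)

# Roughly standardised extremes (residual / sigma).
z_min = res_min / sigma
z_max = res_max / sigma

print(f"expected extremes under normality: {q_low:.2f} to {q_high:.2f}")
print(f"observed standardised extremes:    {z_min:.2f} to {z_max:.2f}")
```

The observed extremes lie well outside the range expected under normality, which is why those few points stand out from the line while the bulk of the residuals follow it closely.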

Final fitted model


Call:
lm(formula = Pct.BF ~ Density + Age + Abdomen, data = bodyfat)

Coefficients:
(Intercept)      Density          Age      Abdomen  
  442.37549   -406.49296      0.01182      0.05761  
  • Fitted model: \[Pct.BF = 442.38 - 406.49 \times Density + 0.01 \times Age + 0.06 \times Abdomen\]
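  1. On average, holding the other variables constant, a one-year increase in age is associated with an increase of about 0.01 percentage points in body fat.
  2. On average, holding the other variables constant, a 1 cm increase in abdomen circumference is associated with an increase of about 0.06 percentage points in body fat.
  3. On average, holding the other variables constant, a one-unit increase in density is associated with a decrease of 406.49 percentage points in body fat. The densities in the data preview differ by only a few hundredths, so a more practical reading is a decrease of about 4.06 percentage points per 0.01 increase in density.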
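
As a worked example, the fitted coefficients can be applied to the first two men in the data preview. The Python sketch below is illustrative only: the function name is our own, and because the printed coefficients are rounded, the predictions are approximate.

```python
# Coefficients as printed in the final model output.
COEF = {
    "Intercept": 442.37549,
    "Density": -406.49296,
    "Age": 0.01182,
    "Abdomen": 0.05761,
}


def predict_pct_bf(density, age, abdomen_cm):
    """Predicted body fat percentage from the fitted three-predictor model."""
    return (
        COEF["Intercept"]
        + COEF["Density"] * density
        + COEF["Age"] * age
        + COEF["Abdomen"] * abdomen_cm
    )


# First two rows of the data preview: (Density, Age, Abdomen, observed Pct.BF).
rows = [(1.0708, 23, 85.2, 12.3), (1.0853, 22, 83.0, 6.1)]
for density, age, abdomen, observed in rows:
    fitted = predict_pct_bf(density, age, abdomen)
    print(f"observed {observed:5.1f}  fitted {fitted:5.2f}  residual {observed - fitted:+.2f}")

# Change in predicted body fat for a 0.01 increase in density.
effect_per_hundredth = COEF["Density"] * 0.01
print(f"effect of +0.01 density: {effect_per_hundredth:.2f} percentage points")
```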

Cross Validation

Cross-validated performance of our fitted model, \(Pct.BF = 442.38 - 406.49 \times Density + 0.01 \times Age + 0.06 \times Abdomen\):

Linear Regression 

250 samples
  3 predictor

No pre-processing
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 226, 226, 224, 224, 225, 226, ... 
Resampling results:

  RMSE      Rsquared   MAE      
  1.004345  0.9770595  0.4931661

Tuning parameter 'intercept' was held constant at a value of TRUE
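
For readers unfamiliar with the procedure, the Python sketch below shows how a 10-fold split and the three reported metrics can be computed. It is a simplified illustration, not caret's implementation: the helper names are our own, the folds are assigned purely at random (so every training set has exactly 225 rows, whereas the sizes above vary between 224 and 226), and the example predictions are hypothetical. R-squared is computed as the squared correlation between observed and predicted values, which we understand to be caret's default.

```python
import math
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle row indices 0..n-1 and deal them into k held-out folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def r_squared(obs, pred):
    """Squared Pearson correlation between observed and predicted values."""
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov ** 2 / (var_o * var_p)


folds = k_fold_indices(250, 10)
train_sizes = [250 - len(fold) for fold in folds]
print("training set sizes:", train_sizes)

# Observed values from the data preview with HYPOTHETICAL predictions.
obs = [12.3, 6.1, 25.3]
pred = [12.0, 7.0, 25.0]
print(f"RMSE {rmse(obs, pred):.3f}  MAE {mae(obs, pred):.3f}  R2 {r_squared(obs, pred):.3f}")
```

In the actual procedure the model is refitted on each training set, the metrics are computed on the held-out fold, and the ten values are averaged.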

Summary of Cross Validation

Interpretation for Cross Validation

  • Smaller RMSE and MAE values indicate a better fit to the data.
  • Higher values of R-squared are better.
  • Comparing the full model with the three simple models (simple_age, simple_abdomen and simple_density):
  1. simple_age and simple_abdomen have relatively higher MAE and RMSE values.
  2. simple_density has the lowest R-squared value.

Therefore, the model labelled full in this comparison, \(Pct.BF = 442.38 - 406.49 \times Density + 0.01 \times Age + 0.06 \times Abdomen\), is the best of the four.